AITopics | offline dataset

In some applications of reinforcement learning, a dataset of pre-collected experience is already available but it is also possible to acquire some additional online data to help improve the quality of the policy. However, it may be preferable to gather additional data with a single, non-reactive exploration policy and avoid the engineering costs associated with switching policies. In this paper we propose an algorithm with provable guarantees that can leverage an offline dataset to design a single non-reactive policy for exploration. We theoretically analyze the algorithm and measure the quality of the final policy as a function of the local coverage of the original dataset and the amount of additional data collected.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre: Research Report (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

ade04fd4f26263f86b47ffb535c4cafb-Paper-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 10:08:33 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Genre: Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

096b1019463f34eb241e87cfce8dfe16-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 13:53:00 GMT

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country: Europe (1.00)

Genre: Research Report (0.93)

Industry: Health & Medicine > Therapeutic Area (0.92)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Add feedback

07d5938693cc3903b261e1a3844590ed-Supplemental.pdf

Neural Information Processing SystemsApr-24-2026, 13:11:19 GMT

machine learning, probability 1, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

07d5938693cc3903b261e1a3844590ed-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 13:11:16 GMT

artificial intelligence, machine learning, reinforcement learning, (11 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

01d78b294d80491fecddea897cf03642-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 07:38:19 GMT

agent, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL

Neural Information Processing SystemsMar-22-2026, 09:39:18 GMT

Offline-to-online (O2O) reinforcement learning (RL) provides an effective means of leveraging an offline pre-trained policy as initialization to improve performance rapidly with limited online interactions. Recent studies often design fine-tuning strategies for a specific offline RL method and cannot perform general O2O learning from any offline method. To deal with this problem, we disclose that there are evaluation and improvement mismatches between the offline dataset and the online environment, which hinders the direct application of pre-trained policies to online fine-tuning. In this paper, we propose to handle these two mismatches simultaneously, which aims to achieve general O2O learning from any offline method to any online method. Before online fine-tuning, we re-evaluate the pessimistic critic trained on the offline dataset in an optimistic way and then calibrate the misaligned critic with the reliable offline actor to avoid erroneous update. After obtaining an optimistic and and aligned critic, we perform constrained fine-tuning to combat distribution shift during online learning. We show empirically that the proposed method can achieve stable and efficient performance improvement on multiple simulated tasks when compared to the state-of-the-art methods.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Genre: Research Report (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

Guided Trajectory Generation with Diffusion Models for Offline Model-based Optimization

Neural Information Processing SystemsMar-21-2026, 17:36:55 GMT

Optimizing complex and high-dimensional black-box functions is ubiquitous in science and engineering fields. Unfortunately, the online evaluation of these functions is restricted due to time and safety constraints in most cases. In offline model-based optimization (MBO), we aim to find a design that maximizes the target function using only a pre-existing offline dataset. While prior methods consider forward or inverse approaches to address the problem, these approaches are limited by conservatism and the difficulty of learning highly multi-modal mappings. Recently, there has been an emerging paradigm of learning to improve solutions with synthetic trajectories constructed from the offline dataset.

artificial intelligence, machine learning, trajectory, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

Add feedback

Filters

Collaborating Authors

offline dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

fd489a44f3bcb9f122e4931ef21d0c43-Paper-Conference.pdf

d601a9b708cacfad167f6c6c45647a18-Paper-Conference.pdf

Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data

ade04fd4f26263f86b47ffb535c4cafb-Paper-Conference.pdf

096b1019463f34eb241e87cfce8dfe16-Paper-Conference.pdf

07d5938693cc3903b261e1a3844590ed-Supplemental.pdf

07d5938693cc3903b261e1a3844590ed-Paper.pdf

01d78b294d80491fecddea897cf03642-Supplemental-Conference.pdf

Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL

Guided Trajectory Generation with Diffusion Models for Offline Model-based Optimization